[compiler-v2] Improved flush writes optimization #15718

vineethk · 2025-01-13T18:09:48Z

Description

Please note that we should not merge this PR until it undergoes comparison testing.

This PR contains a couple of optimizations, that together bring about 0.5% reduction in code size when compiling the aptos-framework and its dependencies (and compiler v2 produces 4% less code compared to compiler v1). They also address some transactions where v2 generated code took more gas than v1 generated code.

We now do peephole optimizations even when inline specs are present. This is accomplished by remapping the code offsets to spec mapping based on the remapping returned by the peephole optimizer.
We consider the usages of temps within an instruction and usages of temps between different instructions. If these usages are such that their definitions are out of order, then we hint at flushing those temps. Note that the flush writes optimization only suggests which writes should be flushed to locals right away, so in the worst case, it would produce additional stores and copy/moves.

As an example, for the Move code below:

fun test() {
    let a = one();
    consume(one(), a, one(), one(), one(), one());
}

Compiler v2 currently produces:

0: Call one(): u64
1: StLoc[0](loc0: u64)
2: Call one(): u64
3: Call one(): u64
4: Call one(): u64
5: Call one(): u64
6: Call one(): u64
7: StLoc[1](loc1: u64)
8: StLoc[2](loc2: u64)
9: StLoc[3](loc3: u64)
10: StLoc[4](loc4: u64)
11: MoveLoc[0](loc0: u64)
12: MoveLoc[4](loc4: u64)
13: MoveLoc[3](loc3: u64)
14: MoveLoc[2](loc2: u64)
15: MoveLoc[1](loc1: u64)
16: Call consume(u64, u64, u64, u64, u64, u64)
17: Ret

With this PR, compiler v2 produces:

0: Call one(): u64
1: StLoc[0](loc0: u64)
2: Call one(): u64
3: MoveLoc[0](loc0: u64)
4: Call one(): u64
5: Call one(): u64
6: Call one(): u64
7: Call one(): u64
8: Call consume(u64, u64, u64, u64, u64, u64)
9: Ret

How Has This Been Tested?

Several new tests.
Existing tests baselines have been checked and verified. We either produce less code or the same amount (except in the case of third_party/move/move-compiler-v2/tests/variable-coalescing/swap.opt.exp where we produce 2 additional instructions).

Key Areas to Review

The specification remapping due to peephole optimization
Flush writes optimization and the correctness argument above in point 2.

Type of Change

Performance improvement

Which Components or Systems Does This Change Impact?

Move Compiler

trunk-io · 2025-01-13T18:09:53Z

⏱️ 48m total CI duration on this PR

Job	Cumulative Duration	Recent Runs
rust-targeted-unit-tests	21m	🟩
rust-cargo-deny	11m	🟩 🟩 🟩 🟩 🟩 (+1 more)
check-dynamic-deps	9m	🟩 🟩 🟩 🟩 🟩 (+1 more)
general-lints	3m	🟩 🟩 🟩 🟩 🟩 (+1 more)
semgrep/ci	2m	🟩 🟩 🟩 🟩 🟩 (+1 more)
file_change_determinator	1m	🟩 🟩 🟩 🟩 🟩 (+1 more)
permission-check	16s	🟩 🟩 🟩 🟩 🟩 (+1 more)
permission-check	14s	🟩 🟩 🟩 🟩 🟩 (+1 more)
check-branch-prefix	1s	🟩

🚨 1 job on the last run was significantly faster/slower than expected

Job	Duration	vs 7d avg	Delta
check-dynamic-deps	3m	2m

_{settings ⋅ feedback ⋅ docs ⋅ learn more about trunk.io}

vineethk · 2025-01-13T18:10:08Z

[compiler-v2] Showcase fixes for several old issues #15785
[compiler-v2] Improved flush writes optimization #15718 👈 (View in Graphite)
main

This stack of pull requests is managed by Graphite. Learn more about stacking.

third_party/move/move-compiler-v2/src/pipeline/dead_store_elimination.rs

third_party/move/move-compiler-v2/src/pipeline/flush_writes_processor.rs

vineethk · 2025-01-14T16:45:31Z

@wrwg I believe I have addressed your comments, PTAL.

wrwg

LGTM, as far as I understand.

I have a general question: what could go wrong if the flush processor has a bug? Is my understanding correct that the code would be weird (unnecessary flushing, for example), but it would not have a semantical effect?

rahxephon89 · 2025-01-14T07:17:25Z

third_party/move/move-compiler-v2/tests/bytecode-generator/freeze_mut_ref.exp

-	23: MoveLoc[2](loc1: &mut u64)
-	24: WriteRef
-	25: Ret
+	11: LdU64(2)


This is an example where optimization makes the code size bigger?

Ah yes, this and swap.opt.exp are the two minor regressions.

third_party/move/move-compiler-v2/src/pipeline/flush_writes_processor.rs

vineethk · 2025-01-15T15:35:51Z

LGTM, as far as I understand.

I have a general question: what could go wrong if the flush processor has a bug? Is my understanding correct that the code would be weird (unnecessary flushing, for example), but it would not have a semantical effect?

@wrwg Yes, like I mentioned in the PR description, in the worst case (say, due to a bug), we mark additional/fewer temps to be flushed, then it can result in more StLoc-copy/moves than necessary, but the overall semantics itself should not change. The flush marks added by this pass can only cause a temp added to a stack to be flushed to a local right away, instead of retaining it on the stack.

github-actions · 2025-01-29T17:18:57Z

✅ Forge suite `compat` success on `55909aa300b714e37df1113a239332c334cb2896` ==> `45f6387456d8a64717faf5c83f5d79b9b32010ea`

Compatibility test results for 55909aa300b714e37df1113a239332c334cb2896 ==> 45f6387456d8a64717faf5c83f5d79b9b32010ea (PR)
1. Check liveness of validators at old version: 55909aa300b714e37df1113a239332c334cb2896
compatibility::simple-validator-upgrade::liveness-check : committed: 12489.70 txn/s, latency: 2491.18 ms, (p50: 2500 ms, p70: 2600, p90: 2800 ms, p99: 3100 ms), latency samples: 411480
2. Upgrading first Validator to new version: 45f6387456d8a64717faf5c83f5d79b9b32010ea
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 4539.24 txn/s, latency: 6875.18 ms, (p50: 7500 ms, p70: 7900, p90: 8600 ms, p99: 8800 ms), latency samples: 93500
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 4366.25 txn/s, latency: 7770.82 ms, (p50: 8700 ms, p70: 8700, p90: 8900 ms, p99: 9100 ms), latency samples: 153080
3. Upgrading rest of first batch to new version: 45f6387456d8a64717faf5c83f5d79b9b32010ea
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 4140.31 txn/s, latency: 7548.45 ms, (p50: 8300 ms, p70: 8900, p90: 9400 ms, p99: 9500 ms), latency samples: 90500
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 4153.88 txn/s, latency: 8190.35 ms, (p50: 9200 ms, p70: 9400, p90: 9600 ms, p99: 9700 ms), latency samples: 146020
4. upgrading second batch to new version: 45f6387456d8a64717faf5c83f5d79b9b32010ea
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 7907.16 txn/s, latency: 3892.16 ms, (p50: 4400 ms, p70: 4900, p90: 5200 ms, p99: 5200 ms), latency samples: 145380
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 7902.21 txn/s, latency: 4327.74 ms, (p50: 4700 ms, p70: 4800, p90: 5000 ms, p99: 5200 ms), latency samples: 264100
5. check swarm health
Compatibility test for 55909aa300b714e37df1113a239332c334cb2896 ==> 45f6387456d8a64717faf5c83f5d79b9b32010ea passed
Test Ok

github-actions · 2025-01-29T17:19:42Z

✅ Forge suite `realistic_env_max_load` success on `45f6387456d8a64717faf5c83f5d79b9b32010ea`

two traffics test: inner traffic : committed: 14776.01 txn/s, submitted: 14776.06 txn/s, expired: 0.05 txn/s, latency: 2681.30 ms, (p50: 2700 ms, p70: 2700, p90: 3000 ms, p99: 3600 ms), latency samples: 5618280
two traffics test : committed: 99.97 txn/s, latency: 1414.99 ms, (p50: 1400 ms, p70: 1400, p90: 1600 ms, p99: 2700 ms), latency samples: 1760
Latency breakdown for phase 0: ["MempoolToBlockCreation: max: 1.501, avg: 1.413", "ConsensusProposalToOrdered: max: 0.303, avg: 0.292", "ConsensusOrderedToCommit: max: 0.412, avg: 0.397", "ConsensusProposalToCommit: max: 0.708, avg: 0.688"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.75s no progress at version 2974571 (avg 0.19s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.62s no progress at version 2378301 (avg 0.62s) [limit 16].
Test Ok

github-actions · 2025-01-29T17:19:53Z

✅ Forge suite `framework_upgrade` success on `55909aa300b714e37df1113a239332c334cb2896` ==> `45f6387456d8a64717faf5c83f5d79b9b32010ea`

Compatibility test results for 55909aa300b714e37df1113a239332c334cb2896 ==> 45f6387456d8a64717faf5c83f5d79b9b32010ea (PR)
Upgrade the nodes to version: 45f6387456d8a64717faf5c83f5d79b9b32010ea
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1616.90 txn/s, submitted: 1622.26 txn/s, failed submission: 5.36 txn/s, expired: 5.36 txn/s, latency: 1794.48 ms, (p50: 1600 ms, p70: 2000, p90: 2700 ms, p99: 3600 ms), latency samples: 144921
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1572.03 txn/s, submitted: 1578.65 txn/s, failed submission: 6.62 txn/s, expired: 6.62 txn/s, latency: 1779.39 ms, (p50: 1800 ms, p70: 2000, p90: 2600 ms, p99: 3900 ms), latency samples: 142460
5. check swarm health
Compatibility test for 55909aa300b714e37df1113a239332c334cb2896 ==> 45f6387456d8a64717faf5c83f5d79b9b32010ea passed
Upgrade the remaining nodes to version: 45f6387456d8a64717faf5c83f5d79b9b32010ea
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1850.35 txn/s, submitted: 1857.11 txn/s, failed submission: 6.41 txn/s, expired: 6.76 txn/s, latency: 1574.59 ms, (p50: 1400 ms, p70: 1800, p90: 2500 ms, p99: 3400 ms), latency samples: 167548
Test Ok

vineethk changed the title ~~[compiler-v2] Experimental flush writes, second take~~ [compiler-v2] Improved flush writes optimization Jan 13, 2025

vineethk force-pushed the vk/experimental-flush-writes-2 branch 4 times, most recently from 1603bb7 to dedeb67 Compare January 13, 2025 21:14

vineethk marked this pull request as ready for review January 13, 2025 21:33

vineethk requested review from rahxephon89, wrwg and fEst1ck January 13, 2025 21:34

wrwg reviewed Jan 14, 2025

View reviewed changes

vineethk force-pushed the vk/experimental-flush-writes-2 branch from dedeb67 to f64e874 Compare January 14, 2025 16:36

vineethk requested a review from wrwg January 14, 2025 16:38

wrwg approved these changes Jan 15, 2025

View reviewed changes

rahxephon89 approved these changes Jan 15, 2025

View reviewed changes

vineethk force-pushed the vk/experimental-flush-writes-2 branch from f64e874 to 14244c9 Compare January 22, 2025 01:56

vineethk mentioned this pull request Jan 22, 2025

[compiler-v2] Showcase fixes for several old issues #15785

Open

1 task

vineethk force-pushed the vk/experimental-flush-writes-2 branch from 14244c9 to 0c69be6 Compare January 28, 2025 15:54

vineethk enabled auto-merge (squash) January 28, 2025 16:12

This comment has been minimized.

Sign in to view

[compiler-v2] Experimental flush writes, second take

45f6387

vineethk force-pushed the vk/experimental-flush-writes-2 branch from 0c69be6 to 45f6387 Compare January 29, 2025 16:52

This comment has been minimized.

Sign in to view

vineethk merged commit dfc3444 into main Jan 29, 2025
46 checks passed

vineethk deleted the vk/experimental-flush-writes-2 branch January 29, 2025 17:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[compiler-v2] Improved flush writes optimization #15718

[compiler-v2] Improved flush writes optimization #15718

vineethk commented Jan 13, 2025 •

edited

Loading

trunk-io bot commented Jan 13, 2025 •

edited

Loading

vineethk commented Jan 13, 2025 •

edited

Loading

vineethk commented Jan 14, 2025

wrwg left a comment

rahxephon89 Jan 14, 2025

vineethk Jan 15, 2025

vineethk commented Jan 15, 2025

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions bot commented Jan 29, 2025

github-actions bot commented Jan 29, 2025

github-actions bot commented Jan 29, 2025

[compiler-v2] Improved flush writes optimization #15718

[compiler-v2] Improved flush writes optimization #15718

Conversation

vineethk commented Jan 13, 2025 • edited Loading

Description

How Has This Been Tested?

Key Areas to Review

Type of Change

Which Components or Systems Does This Change Impact?

trunk-io bot commented Jan 13, 2025 • edited Loading

vineethk commented Jan 13, 2025 • edited Loading

vineethk commented Jan 14, 2025

wrwg left a comment

Choose a reason for hiding this comment

rahxephon89 Jan 14, 2025

Choose a reason for hiding this comment

vineethk Jan 15, 2025

Choose a reason for hiding this comment

vineethk commented Jan 15, 2025

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

This comment has been minimized.

github-actions bot commented Jan 29, 2025

✅ Forge suite compat success on 55909aa300b714e37df1113a239332c334cb2896 ==> 45f6387456d8a64717faf5c83f5d79b9b32010ea

github-actions bot commented Jan 29, 2025

✅ Forge suite realistic_env_max_load success on 45f6387456d8a64717faf5c83f5d79b9b32010ea

github-actions bot commented Jan 29, 2025

✅ Forge suite framework_upgrade success on 55909aa300b714e37df1113a239332c334cb2896 ==> 45f6387456d8a64717faf5c83f5d79b9b32010ea

vineethk commented Jan 13, 2025 •

edited

Loading

trunk-io bot commented Jan 13, 2025 •

edited

Loading

vineethk commented Jan 13, 2025 •

edited

Loading

✅ Forge suite `compat` success on `55909aa300b714e37df1113a239332c334cb2896` ==> `45f6387456d8a64717faf5c83f5d79b9b32010ea`

✅ Forge suite `realistic_env_max_load` success on `45f6387456d8a64717faf5c83f5d79b9b32010ea`

✅ Forge suite `framework_upgrade` success on `55909aa300b714e37df1113a239332c334cb2896` ==> `45f6387456d8a64717faf5c83f5d79b9b32010ea`